The R package otu2ot for implementing the entropy decomposition of nucleotide variation in sequence data
Abstract
Oligotyping is a novel, supervised computational method that classifies closely related sequences into "oligotypes" (OTs) based on subtle nucleotide variation (Eren et al., 2013). Its application to microbial datasets has helped reveal ecological patterns that are often hidden by the way sequence data are currently clustered to define operational taxonomic units (OTUs). Here, we implemented the OT entropy decomposition procedure and its unsupervised version, Minimal Entropy Decomposition (MED; Eren et al., 2014c), in R, the statistical programming language and environment. The aim of this implementation is to integrate computational routines, interactive statistical analyses, and visualization into a single framework. In addition, two complementary approaches are implemented: (1) an analytical method (the broken stick model) that helps identify low-abundance OTs that could be generated by chance alone, and (2) a one-pass profiling (OP) method that efficiently identifies the OTUs most promising for subsequent oligotyping. These enhancements are especially useful for large datasets, where manually screening entropy analysis results and creating a full set of OTs may not be feasible. The package and procedures are illustrated by several tutorials and examples.
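The core ideas mentioned in the abstract can be illustrated with a minimal sketch. It is written in Python rather than R and does not reproduce the otu2ot API: the function names `column_entropies`, `oligotypes`, and `broken_stick` are hypothetical, the use of base-2 logarithms is an assumption, and the way the package applies the broken stick model to low-abundance OTs is not described in the abstract. The sketch computes the Shannon entropy at each position of a set of aligned reads, groups reads by the bases found at chosen high-entropy positions, and computes the classical broken stick expectations for the relative sizes of n pieces.

```python
import math
from collections import Counter


def column_entropies(seqs):
    """Shannon entropy (in bits, an assumed unit) at each aligned position."""
    if not seqs:
        return []
    length = len(seqs[0])
    if any(len(s) != length for s in seqs):
        raise ValueError("sequences must be aligned to the same length")
    total = len(seqs)
    entropies = []
    for pos in range(length):
        counts = Counter(s[pos] for s in seqs)
        # Sum of -p * log2(p) over the bases observed at this position.
        h = sum(-(c / total) * math.log2(c / total) for c in counts.values())
        entropies.append(h)
    return entropies


def oligotypes(seqs, positions):
    """Count reads per oligotype, i.e. per combination of bases at `positions`."""
    return Counter("".join(s[p] for p in positions) for s in seqs)


def broken_stick(n):
    """Expected relative sizes of n pieces of a randomly broken stick,
    largest first: E_k = (1/n) * sum_{i=k}^{n} 1/i."""
    return [sum(1.0 / i for i in range(k, n + 1)) / n for k in range(1, n + 1)]


# Toy aligned reads (made up for illustration only).
reads = ["ACGT", "ACGT", "ATGT", "ATGA"]
entropy = column_entropies(reads)

# Pick the two most variable positions and decompose the reads on them.
top = sorted(sorted(range(len(entropy)), key=lambda i: -entropy[i])[:2])
observed = oligotypes(reads, top)

# Compare observed relative abundances with the broken stick expectations.
expected = broken_stick(len(observed))
ranked = sorted(observed.values(), reverse=True)
for count, exp in zip(ranked, expected):
    print(f"observed {count / len(reads):.3f}  expected {exp:.3f}")
```

In this sketch, an oligotype whose observed relative abundance falls below the broken stick expectation for its rank would be a candidate for having arisen by chance; whether otu2ot applies exactly this criterion is an assumption.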